Search Result

Case reading comprehension method combining syntactic guidance and character attention mechanism

HE Zhenghai, XIAN Yantuan, WANG Meng, YU Zhengtao

Journal of Computer Applications 2021, 41 (8): 2427-2431. DOI: 10.11772/j.issn.1001-9081.2020101568

Case reading comprehension, the application of machine reading comprehension to the judicial field, is an important part of judicial intelligence: a computer reads judgment documents and answers questions about them. The mainstream approach to machine reading comprehension uses a deep learning model to encode the words of a text into vector representations. The core problems in building such a model are how to obtain a semantic representation of the text and how to match questions against the context. Because syntactic information helps a model learn the skeleton of a sentence, and Chinese characters carry latent semantic information of their own, a case reading comprehension method that integrates syntactic guidance and a character attention mechanism was proposed. Fusing syntactic information with Chinese character information improved the model's ability to encode case text. Experimental results on the reading comprehension dataset of Law Research Cup 2019 show that, compared with the baseline model, the proposed method increases the Exact Match (EM) value by 0.816 and the F1 value by 1.809%.
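The two metrics reported above can be illustrated with a minimal sketch. It assumes character-level comparison, which is common for Chinese answers; the dataset's official scorer may normalize text differently, and the function names `exact_match` and `char_f1` are hypothetical.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the stripped answers are identical, else 0.0."""
    return float(prediction.strip() == reference.strip())


def char_f1(prediction: str, reference: str) -> float:
    """Character-level F1 over the multiset overlap of the two answers."""
    pred_chars = list(prediction.strip())
    ref_chars = list(reference.strip())
    if not pred_chars or not ref_chars:
        # Two empty answers count as a match; one empty answer is a miss.
        return float(pred_chars == ref_chars)
    overlap = sum((Counter(pred_chars) & Counter(ref_chars)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(ref_chars)
    return 2 * precision * recall / (precision + recall)
```

EM rewards only a perfect answer span, while F1 gives partial credit for overlapping characters, which is why the two values usually move by different amounts.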

Chinese-Vietnamese pseudo-parallel corpus generation based on monolingual language model

JIA Chengxun, LAI Hua, YU Zhengtao, WEN Yonghua, YU Zhiqiang

Journal of Computer Applications 2021, 41 (6): 1652-1658. DOI: 10.11772/j.issn.1001-9081.2020071017

Neural machine translation achieves good results on resource-rich languages but, because of data scarcity, performs poorly on low-resource language pairs such as Chinese-Vietnamese. One of the most effective ways to alleviate this problem is to use existing resources to generate pseudo-parallel data. Considering the availability of monolingual data, and building on the back-translation method, a language model trained on a large amount of monolingual data was first fused with the neural machine translation model. Language features were then integrated into the language model during back-translation to generate more standardized, higher-quality pseudo-parallel data. Finally, the generated corpus was added to the original small-scale corpus to train the final translation model. Experimental results on Chinese-Vietnamese translation tasks show that, compared with ordinary back-translation, fusing the pseudo-parallel data generated with the language model improves the BiLingual Evaluation Understudy (BLEU) score of Chinese-Vietnamese neural machine translation by 1.41 percentage points.
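The abstract does not say how the language model and the translation model were fused. One common option is shallow fusion, in which a candidate is scored by the translation log-probability plus a weighted language-model log-probability. The sketch below assumes that technique; the names `fused_score`, `pick_best`, and `lm_weight` are hypothetical and not taken from the paper.

```python
def fused_score(tm_logprob: float, lm_logprob: float, lm_weight: float = 0.3) -> float:
    """Shallow fusion (assumed): translation score plus weighted LM score."""
    return tm_logprob + lm_weight * lm_logprob


def pick_best(candidates, lm_weight: float = 0.3) -> str:
    """Choose the back-translation candidate with the highest fused score.

    candidates: list of (text, tm_logprob, lm_logprob) tuples.
    """
    best = max(candidates, key=lambda c: fused_score(c[1], c[2], lm_weight))
    return best[0]
```

With `lm_weight=0` the choice reduces to the translation model alone; raising the weight favors candidates that the monolingual language model finds more fluent.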

Chinese-Vietnamese news topic discovery method based on cross-language neural topic model

YANG Weiya, YU Zhengtao, GAO Shengxiang, SONG Ran

Journal of Computer Applications 2021, 41 (10): 2879-2884. DOI: 10.11772/j.issn.1001-9081.2020122054

In the Chinese-Vietnamese cross-language news topic discovery task, parallel corpora are scarce, which makes high-quality bilingual word embeddings difficult to train, and news texts are generally long, so bilingual word embeddings struggle to represent them well. To solve these problems, a Chinese-Vietnamese news topic discovery method based on Cross-Language Neural Topic Model (CL-NTM) was proposed. The method represents news text by its topic information and converts bilingual semantic alignment into a bilingual topic alignment task. First, neural topic models based on the variational autoencoder were trained separately for Chinese and Vietnamese to obtain monolingual abstract representations of the topics. Then, a small-scale parallel corpus was used to map the bilingual topics into the same semantic space. Finally, the K-means method was used to cluster the bilingual topic representations and find the topics of news event clusters. Experimental results show that, compared with the Improved Chinese-English Latent Dirichlet Allocation model (ICE-LDA), the proposed method increases the Macro-F1 value and topic coherence by 4 and 7 percentage points respectively, indicating that it effectively improves the clustering quality and topic interpretability of news topics.
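The final clustering step can be sketched with a plain implementation of Lloyd's K-means algorithm. This is an illustrative sketch only: it assumes the topic representations are already mapped into one shared space as fixed-length vectors, and it takes the initial centroids as an argument so the result is deterministic. The function name `kmeans` and its parameters are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm; returns (assignments, centroids)."""
    centroids = [list(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignments = [
            min(
                range(len(centroids)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
            )
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [pt for pt, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids
```

Because Chinese and Vietnamese topic vectors share one space after the mapping step, a single call can cluster both languages together.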

Chinese-Vietnamese bilingual multi-document news opinion sentence recognition based on sentence association graph

WANG Jian, TANG Shan, HUANG Yuxin, YU Zhengtao

Journal of Computer Applications 2020, 40 (10): 2845-2849. DOI: 10.11772/j.issn.1001-9081.2020020280

Traditional opinion sentence recognition classifies sentences mainly by the emotional features inside each sentence. In cross-lingual multi-document opinion sentence recognition, the associations between sentences in different languages and documents also provide support for recognition. Therefore, a Chinese-Vietnamese bilingual multi-document news opinion sentence recognition method was proposed that combines a Bi-directional Long Short-Term Memory (Bi-LSTM) network framework with sentence association features. First, emotional elements and event elements were extracted from the Chinese-Vietnamese bilingual sentences to construct the sentence association graph, and the sentence association features were obtained with the TextRank algorithm. Second, the Chinese and Vietnamese news texts were encoded in the same semantic space based on bilingual word embeddings and Bi-LSTM. Finally, opinion sentences were recognized by jointly considering the sentence coding features and semantic features. The theoretical analysis and simulation results show that integrating the sentence association graph effectively improves the precision of multi-document opinion sentence recognition.
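As an illustration of how TextRank scores the nodes of a sentence association graph, the sketch below runs the weighted iteration on a symmetric weight matrix. It uses the normalized PageRank-style form, in which the scores sum to 1. How the paper derives edge weights from emotional and event elements is not stated in the abstract, so the matrix here is assumed to be given, and the name `textrank` is hypothetical.

```python
def textrank(weights, damping=0.85, iterations=50):
    """Weighted TextRank scores for an undirected sentence graph.

    weights[i][j] is the association strength between sentences i and j.
    """
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives a share of every neighbor's score,
        # proportional to the weight of the connecting edge.
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores
```

A sentence linked to many strongly associated sentences ends up with a higher score, which can then serve as an association feature alongside the encoder output.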

SMFCC: a novel feature extraction method for speech signal

WANG Haibin, YU Zhengtao, MAO Cunli, GUO Jianyi

Journal of Computer Applications 2016, 36 (6): 1735-1740. DOI: 10.11772/j.issn.1001-9081.2016.06.1735

To address the problems of effective speech feature extraction and the influence of noise in speaker recognition, a novel speech feature extraction method called Mel Frequency Cepstral Coefficients based on S-transform (SMFCC) was proposed. Building on traditional Mel Frequency Cepstral Coefficients (MFCC), the method exploits the two-dimensional Time-Frequency (TF) multiresolution property of the S-transform, denoises the two-dimensional TF matrix with the Singular Value Decomposition (SVD) algorithm, and combines these with related statistical methods. The extracted features were compared experimentally with existing features on the TIMIT corpus. The Equal Error Rate (EER) and Minimum Detection Cost Function (MinDCF) of SMFCC were smaller than those of Linear Prediction Cepstral Coefficient (LPCC), MFCC, and LMFCC; in particular, the EER and MinDCF08 of SMFCC were 3.6% and 17.9% lower, respectively, than those of MFCC. The experimental results show that the proposed method effectively eliminates noise in the speech signal and improves the resolution of local speech signal features.
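For readers unfamiliar with the EER metric used above, the following sketch approximates it from verification scores by sweeping every observed score as a decision threshold and taking the point where the false acceptance and false rejection rates are closest. It illustrates the metric only, not the paper's evaluation code; the function name `equal_error_rate` and the convention that higher scores mean "same speaker" are assumptions.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER from genuine-trial and impostor-trial scores."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        # A trial is accepted when its score is at least the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)      # false rejections
        far = sum(s >= t for s in impostor) / len(impostor)   # false acceptances
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

A lower EER means the genuine and impostor score distributions overlap less, so better features should push this value down.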

Recognition of Chinese news event correlation based on grey relational analysis

LIU Panpan, HONG Xudong, GUO Jianyi, YU Zhengtao, WEN Yonghua, CHEN Wei

Journal of Computer Applications 2016, 36 (2): 408-413. DOI: 10.11772/j.issn.1001-9081.2016.02.0408

To address the low accuracy of identifying related Chinese events, a correlation recognition algorithm for Chinese news events based on Grey Relational Analysis (GRA), a multiple-factor analysis method, was proposed. First, by analyzing the characteristics of Chinese news events, three factors that affect event correlation were identified: co-occurrence of triggers, nouns shared between events, and similarity of the event sentences. Second, the three factors were quantified and their influence weights were calculated. Finally, GRA was used to combine the three factors, and a GRA model between events was established to recognize event correlation. The experimental results show that the three factors are effective for event correlation recognition and that, compared with methods using only one influence factor, the proposed algorithm improves recognition accuracy.
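The way GRA combines several factors into one score can be sketched with the standard grey relational coefficient, (Δmin + ρΔmax) / (Δ + ρΔmax), followed by a weighted sum. The sketch assumes each event pair is described by already quantified factor values in [0, 1] and compared against an ideal reference sequence. The distinguishing coefficient `rho=0.5` is the conventional default rather than a value from the paper, and the function name is hypothetical.

```python
def grey_relational_grade(reference, candidates, weights=None, rho=0.5):
    """Return one grey relational grade per candidate sequence."""
    # Absolute differences between the reference and each candidate.
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    n = len(reference)
    weights = weights or [1.0 / n] * n  # equal weights unless supplied
    grades = []
    for row in deltas:
        if d_max == 0:
            grades.append(1.0)  # every candidate equals the reference
            continue
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(w * c for w, c in zip(weights, coeffs)))
    return grades
```

For example, with the reference `[1, 1, 1]` (triggers co-occur, nouns are shared, sentences are similar), an event pair whose factor values are closer to the reference receives a higher grade and is more likely to be judged correlated.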

Unsupervised text sentiment transfer method based on generating prompt

HUANG Yuxin, XU Jialong, YU Zhengtao, HOU Shukai, ZHOU Jiaqi

Journal of Computer Applications DOI: 10.11772/j.issn.1001-9081.2023091302
Online available: 15 March 2024